# OCR enhancement
Trocr Ajami
A model that converts text in images into machine-readable text.
Image-to-Text
TensorBoard Other

TutlaytAI
139
0
Aya Vision 8b
Aya Vision 8B is an open-weight 8-billion-parameter multilingual vision-language model supporting visual and language tasks in 23 languages.
Image-to-Text
Transformers Supports Multiple Languages

CohereLabs
29.94k
282
Internvit 300M 448px
MIT
InternViT-300M-448px is an efficient vision foundation model distilled from InternViT-6B-448px-V1-5. It supports dynamic input resolution, processing each image as 1 to 40 tiles of 448×448 pixels.
Image Feature Extraction
Transformers

OpenGVLab
7,506
57
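
To make "1 to 40 tiles of 448×448 pixels" concrete, here is a minimal Python sketch of how an image might be mapped onto a tile grid. The selection rule is an assumption for illustration, not taken from this listing or from the model's official preprocessing. It picks the grid whose aspect ratio is closest to the image's, then prefers the grid whose total area is closest to the image's area. The names `candidate_grids` and `choose_grid` are hypothetical.

```python
from typing import List, Tuple


def candidate_grids(min_tiles: int = 1, max_tiles: int = 40) -> List[Tuple[int, int]]:
    """All (cols, rows) grids whose tile count lies in [min_tiles, max_tiles]."""
    return sorted(
        ((c, r)
         for c in range(1, max_tiles + 1)
         for r in range(1, max_tiles + 1)
         if min_tiles <= c * r <= max_tiles),
        key=lambda g: (g[0] * g[1], g[0]),  # deterministic order: by tile count, then cols
    )


def choose_grid(width: int, height: int, min_tiles: int = 1,
                max_tiles: int = 40, tile: int = 448) -> Tuple[int, int]:
    """Hypothetical rule: closest aspect ratio first, then closest total pixel area."""
    aspect = width / height
    area = width * height
    return min(
        candidate_grids(min_tiles, max_tiles),
        key=lambda g: (abs(aspect - g[0] / g[1]),           # match the image's shape
                       abs(g[0] * g[1] * tile * tile - area)),  # then its size
    )


if __name__ == "__main__":
    for w, h in [(448, 448), (896, 448), (4480, 4480)]:
        cols, rows = choose_grid(w, h)
        print(f"{w}x{h} -> {cols}x{rows} grid ({cols * rows} tiles)")
```

Under this rule a square 448×448 image stays a single tile, while a large square image grows to a 6×6 grid, because 7×7 would exceed the 40-tile cap.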
Idefics2 8b Chatty
Apache-2.0
Idefics2 is an open multimodal model capable of accepting arbitrary sequences of images and text as input and generating text output. The model can answer questions about images, describe visual content, create stories based on multiple images, or function purely as a language model.
Image-to-Text
Transformers English

HuggingFaceM4
617
94
Internvit 6B 448px V1 5
MIT
InternViT-6B-448px-V1-5 is a vision foundation model fine-tuned from InternViT-6B-448px-V1-2, offering strong robustness, OCR capabilities, and high-resolution image processing.
Image Feature Extraction
Transformers

OpenGVLab
155
79
Internvit 6B 448px V1 2
MIT
InternViT-6B-448px-V1-2 is a vision foundation model that serves as a feature backbone. It has about 5.5 billion parameters (5,540 million) and processes images at 448×448 pixels.
Image Feature Extraction
Transformers

OpenGVLab
19
27
Donut Base Payslips
MIT
A document understanding model based on the Donut architecture, fine-tuned specifically for processing payslip images.
Text Recognition
Transformers

Assadullah
20
0
Trocr Captcha
MIT
An open-source, MIT-licensed model with a reported character error rate (CER) of 0.0019, indicating high accuracy on its target task.
Image-to-Text
Transformers

tomofi
37
5
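
The CER quoted for Trocr Captcha is conventionally computed as follows. Take the character-level edit (Levenshtein) distance between predicted and reference text, then divide it by the number of reference characters. A CER of 0.0019 therefore corresponds to roughly 19 character errors per 10,000 reference characters.

The Python sketch below computes a corpus-level CER this way. The function names are hypothetical, and the listing does not state exactly how its figure was measured.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edit distance divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    chars = sum(len(r) for r in references)
    if chars == 0:
        raise ValueError("references contain no characters")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return edits / chars


if __name__ == "__main__":
    print(cer(["abcd"], ["abed"]))  # one substitution over four characters
```

Summing edits and characters over the whole corpus, rather than averaging per-sample CERs, keeps short strings from dominating the score.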